Mis-recognized utterance detection using hierarchical language model
Abstract
This paper proposes a scheme for detecting and modifying mis-recognized utterances in order to recover from speech recognition errors in speech translation. Mis-recognition is frequently observed at the speech recognition stage. Most mis-recognitions result from acoustic model mis-match and out-of-vocabulary (OOV) words. To cope with both, we adopt a hierarchical language model to identify them. A hierarchical language model can generate hypotheses both with and without OOVs (or acoustically mis-matched words). The likelihood difference between these hypotheses is used as an utterance confidence measure. To confirm the feasibility of this scheme, as a first experiment, we conducted speech recognition and mis-recognized utterance detection experiments. The experimental results showed a 99% detection rate for utterances with OOVs, considerably higher than the 94% achieved by a conventional detection method using a-posteriori probability. For utterances without OOVs, a detection rate of 80% was obtained, which is comparable to that of the conventional method. These results support the feasibility of the proposed error detection and modification scheme.
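As an illustration only, the likelihood-difference confidence measure described in the abstract can be sketched as follows. The abstract does not give the formula, so the `Hypothesis` structure, the sign convention (in-vocabulary score minus OOV-containing score), the log-likelihood scale, and the threshold value are all assumptions made for this sketch, not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    """A recognition hypothesis and its total log-likelihood (hypothetical structure)."""
    words: list[str]
    log_likelihood: float


def confidence(in_vocab: Hypothesis, with_oov: Hypothesis) -> float:
    """Log-likelihood difference between the two hypotheses.

    Assumed convention: a larger value means the in-vocabulary
    hypothesis is better supported than the OOV-containing one.
    """
    return in_vocab.log_likelihood - with_oov.log_likelihood


def is_misrecognized(in_vocab: Hypothesis, with_oov: Hypothesis,
                     threshold: float = 0.0) -> bool:
    """Flag the utterance when the confidence falls below a threshold.

    The default threshold of 0.0 is an arbitrary placeholder; in practice
    it would be tuned on held-out data.
    """
    return confidence(in_vocab, with_oov) < threshold


# Usage: the in-vocabulary hypothesis scores higher, so the utterance is accepted.
accepted = is_misrecognized(
    Hypothesis(["i", "want", "a", "ticket"], -120.0),
    Hypothesis(["i", "want", "a", "<OOV>"], -135.0),
)
```

Using a difference of two scores from the same recognizer, rather than a single absolute score, keeps the measure relative to the utterance itself, which is one plausible reading of why the abstract frames it as a confidence measure.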
Similar resources
Exploring Features For Localized Detection of Speech Recognition Errors
We address the problem of localized error detection in Automatic Speech Recognition (ASR) output to support the generation of targeted clarifications in spoken dialogue systems. Localized error detection finds specific mis-recognized words in a user utterance. Targeted clarifications, in contrast with generic ‘please repeat/rephrase’ clarifications, target a specific mis-recognized word in an u...
Miscommunication handling in spoken dialog systems based on error-aware dialog state detection
With the exponential growth in computing power and progress in speech recognition technology, spoken dialog systems (SDSs), with which a user interacts through natural speech, have been widely used in human-computer interaction. However, error-prone automatic speech recognition (ASR) results usually lead to inappropriate semantic interpretation, so miscommunication happens easily. This paper p...
Spoken Interface for Correcting Phoneme Recognition Errors in Learning of Unknown Words
This paper describes a novel method that enables users to teach systems the phoneme sequences of new words through speech interaction. Using the method, users can correct mis-recognized phoneme sequences incrementally by making corrective utterances. Each corrective utterance may include the whole or a segment of the word. During the interaction, if the correction using the utterance results in...
Dialogue Speech Recognition by Combining Hierarchical Topic Classification and Language Model Switching
An efficient, scalable speech recognition architecture combining topic detection and topic-dependent language modeling is proposed for multi-domain spoken language systems. In the proposed approach, the inferred topic is automatically detected from the user’s utterance, and speech recognition is then performed by applying an appropriate topic-dependent language model. This approach enables user...
Mis-recognized Utterance Detection Usin Generated by Clustere
This paper proposes a new method of detecting mis-recognized utterances based on a ROVER-like voting scheme. Although the ROVER approach is effective in improving recognition accuracy, it has two serious problems from a practical point of view: 1) it is difficult to construct multiple automatic speech recognition (ASR) systems, 2) the computational cost increases according to the number of ASR s...